Demographic and Citation Trends in Astrophysical Journal papers and Preprints



Greg J. Schwarz 
(AAS Journals Editorial Staff Scientist) 
and Robert C. Kennicutt, Jr. 
(Editor-in-Chief, The Astrophysical Journal) 
Steward Observatory, University of Arizona, Tucson, AZ 85721 



Abstract 

We have used data from the Astrophysics Data Sys- 
tem (ADS), the American Astronomical Society (AAS), 
and the arXiv electronic preprint server (astro-ph), to
study the publishing, preprint posting, and citation pat- 
terns for papers published in The Astrophysical Journal 
(ApJ) in 1999 and 2002. This allowed us to track sta- 
tistical trends in author demographics, preprint posting 
habits, and citation rates for ApJ papers as a whole and 
across various subgroups and types of ApJ papers. The 
most interesting results are the frequencies of use of the 
astro-ph server across various subdisciplines of astronomy,
and the impact that such posting has on the citation
history of the subsequent ApJ papers. By 2002
72% of ApJ papers were posted as astro-ph preprints, 
but this fraction varies from 22% to 95% among the subfields
studied. A majority of these preprints (61%) were
posted after the papers were accepted at ApJ, and 88% 
were posted or updated after acceptance. On average, 
ApJ papers posted on astro-ph are cited more than twice 
as often as those that are not posted on astro-ph. This 
difference can account for a number of other, secondary 
citation trends, including some of the differences in citation
rates between journals and different subdisciplines.
Preprints clearly have supplanted the journals as the pri- 
mary means for initially becoming aware of papers, at 
least for a large fraction of the ApJ author community. 
Publication in a widely-recognized peer-reviewed jour- 
nal remains as the primary determinant of the impact 
of a paper, however. For example, conference proceed- 
ings papers posted on astro-ph are also cited twice as 
frequently as those that are not posted, but overall such 
papers are still cited 20 times less often than the average 
ApJ paper. These results provide insights into how as- 
tronomical research is currently disseminated by authors 
and ingested by readers. 

Keywords: sociology of astronomy — astronomical 
data bases: miscellaneous 

1 Introduction 

In 2001 The Astrophysical Journal (ApJ) considered 
a plan to post preprint versions of its accepted papers 
on the ApJ web site. As part of this planning we inves- 
tigated the degree to which ApJ authors already used 
existing preprint servers, in particular the Los Alamos
arXiv/astro-ph service, when papers were posted in the
review cycle, and other patterns of usage. Based on a 
preliminary study of ~300 ApJ papers published in 2001,
we found that the fraction of papers posted on astro-ph 
was high (73%), but with a wide variation in this frac- 
tion across different subfields of astrophysics. We also 
noted distinct patterns in when papers were posted, with 
only 64% of such papers posted after the articles were 
accepted at the ApJ. 

As a result of this investigation we decided to make 
preprints of all accepted ApJ papers available on our 
website (subject to author permission), in order to make 
these papers available early to the segment of our reader 
community that was not making heavy use of the astro- 
ph service. These results also stimulated us to under- 
take a larger and more statistically robust study of pub- 
lication, preprint posting, and citation patterns of ApJ 
papers. At the heart of the new study is a database con- 
taining 1639 papers, equivalent to a full year of papers in 
the ApJ Part 1 (main journal), and more than five times 
the number of papers analyzed previously. The database 
also includes significantly more information about each 
paper, including preprint information from the astro-ph 
server, first author demographic data from the American 
Astronomical Society (AAS) membership directory, and 
citation data from the NASA Astrophysics Data System 
(ADS) database. 

In this paper we report our findings, with the re- 
mainder of this paper organized as follows. In § 2 we 
describe how the database was constructed, and how we 
categorized paper subjects and types to analyze demo- 
graphic trends across our author base. In § 3 we use 
these data to characterize the trends in publication pat- 
terns within the ApJ. In § 4 we analyze statistics on our 
authors, and in § 5 we look at trends in preprint posting 
for various subjects and author categories. We present 
citation statistics on the papers in § 6, and confirm the 
most interesting result of our study, namely that papers 
posted prior to publication on astro-ph are cited at ap- 
proximately twice the rate as those that are not posted 
prior to publication. We discuss and interpret these re- 
sults in § 7. 

2 Database Construction

All of the ApJ^1 papers published during the latter
halves of 1999 and 2002 form the core of the database.
The 2002 data provided a check on any recent changes in 
demographic trends, particularly with regard to preprint 
posting, while the 1999 data allowed us to track citation 
trends for an extended period after those papers were 
published. There are 795 papers from 1999 and 844 pa- 
pers from 2002. 

For each paper a number of attributes were compiled 
from four sources. The first source was the electronic 
ApJ. Attributes recorded included obvious data like ti- 
tles, page lengths, dates that the papers were received, 
accepted, and posted, the number of authors, and the 
names and affiliations of the first authors. No informa- 
tion was gathered about coauthors other than their total 
number. Authors were designated as working at either 
U.S. or non-U.S. institutions, based on their institutional
address published in the paper. When multiple affilia- 
tions were given only the first was recorded. Each paper 
was also placed into one of seven subdisciplines of astronomy.
The subdisciplines chosen were cosmology (C), extragalactic
astronomy (EG), Milky Way (MW), Galactic
ISM (ISM), stellar (S), solar system (SS), and other (O). 
The solar system category consists mainly of papers on solar
astrophysics, plus a handful of papers on space physics
and other solar system bodies. The "other" subdiscipline includes
papers on topics such as instrumentation, atomic and nuclear
processes, and analytical and numerical techniques.
These subdivisions are somewhat arbitrary, and when
a paper covered more than one area we assigned it to 
its primary category as best as we could. Following the 
classification scheme outlined in Abt (1993), each paper 
was also placed into one of the following classifications:
theoretical papers containing essentially no observations 
(T), observational papers presenting new observations 
(O), papers reanalyzing or rediscussing previous obser- 
vations (R), and laboratory or instrumentation papers 
(L). The analysis of these ApJ attributes is given in § 3. 

The second data source was the AAS membership 
list provided by the AAS Executive Office, which in- 
cludes the names and AAS membership status of over 
10,000 current and former AAS members. This enabled 
us to determine the fraction of authors who were AAS 
members, and also compile age and gender information 
for those authors.^2 We were able to determine the gender
for all but 6% of the remaining authors, after an exhaustive
web search and consultations with colleagues.
Results from analysis of these data are presented in § 4.

^1 In order to keep the database as homogeneous as possible we
only analyzed papers published in the ApJ Part 1 (main journal);
papers published in ApJ Supplement Series and ApJ Letters were
not part of this survey. Likewise, editorials and errata were not
included.

^2 These demographic data were only used to track statistical
trends in the demographics of our author community; the
confidentiality of information on individuals has been preserved.

The third data source was the astro-ph preprint
server^3. Papers with preprints were identified by author
and title text searches on the astro-ph search page.
When a match was found the astro-ph identification 
number, the dates of the first and last submission, and 
the total number of submissions were recorded in our 
database. The astro-ph submission dates were compared 
to the submitted and accepted dates of the correspond- 
ing ApJ article to determine where the paper was sent 
first. We then used this information to define four cat- 
egories of preprints. The "PreApJ" (Pre) group con- 
sists of a single preprint arriving at astro-ph near the 
time of the ApJ submission. Such papers were posted 
as preprints prior to peer review, and were never up- 
dated afterwards. Since 97-98% of published ApJ pa- 
pers undergo significant revision during the peer review 
process, this also means that the vast majority of such 
preprints differ significantly from the accepted and pub- 
lished articles. We defined a second "PostApJ" (Post) 
group, consisting of single preprint postings that were 
sent to astro-ph at approximately the same time as the 
ApJ acceptance date. Apart from minor changes made 
in copyediting and proofs, the scientific content of these 
preprints is essentially the same as that of the subse- 
quently published ApJ articles. A third category called 
"Updated astro-ph" (Up) consists of articles that were 
posted more than once with astro-ph. Usually these are
preprints that were first posted near to the time of sub- 
mission, and then were updated following peer review 
and acceptance at the ApJ. The final class, "Unknown 
astro-ph" (Unk) applied to rare cases where the astro-ph 
posting dates bore no discernable relation to the journal 
submission and peer review timelines; fortunately they 
represent less than 1% of the sample. § 5 presents an 
analysis of the preprint data. 
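
The four-way classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text gives no numerical definition of "near the time of" submission or acceptance, so the 30-day `WINDOW` is hypothetical, as are the function and argument names.

```python
from datetime import date, timedelta

# Hypothetical tolerance for "near the time of": no numerical threshold
# is stated in the text, so 30 days is an assumption for illustration.
WINDOW = timedelta(days=30)


def classify_preprint(first_post, n_versions, submitted, accepted):
    """Return 'Pre', 'Post', 'Up', or 'Unk' for one astro-ph record."""
    if n_versions > 1:
        return "Up"  # posted more than once with astro-ph
    gap_to_submission = abs(first_post - submitted)
    gap_to_acceptance = abs(first_post - accepted)
    if min(gap_to_submission, gap_to_acceptance) > WINDOW:
        return "Unk"  # posting date unrelated to the review timeline
    # A single posting is assigned to whichever milestone it is closer to.
    return "Pre" if gap_to_submission <= gap_to_acceptance else "Post"
```

For example, a single posting made two days after the ApJ submission date would fall in the "Pre" group, while one made a few days after acceptance would fall in the "Post" group.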

The last data source was ADS^4. We were particularly
interested in using the ADS citation database to
track the papers' impact, as measured by the number 
of citations (e.g., Kurtz et al. 2004, Pearce 2004). It 
is important to emphasize that the ADS citation data 
are not complete, but by 1999 all of the major journals 
in astronomy were included, so they should provide re- 
liable information on relative citation trends, with the 
exception of citation trends across subfields, where the 
impact of journals outside of mainstream astronomy may 
be significant. The bibcode for each paper (Schmitz et 
al. 1995) was used as the input for the ADS query. From 
the output we recorded the total number of citations, as 
well as the publication dates of each of the citing pa- 
pers. Our data include self-citations; it simply was not 
practical to filter these out among the thousands of ci- 
tations recorded in our database. However we did deter- 
mine the overall first author self-citation fraction for our 
sample (14.8%^5), and found that it is relatively constant
across the various subdiscipline and demographic categories
in our analysis. Therefore we are confident that
the inclusion of self-citations does not influence any of
the conclusions of our analysis.

^3 http://arxiv.org/archive/astro-ph
^4 http://adswww.harvard.edu/

^5 Where our definition of a self citation is when any of the
authors of a citing paper is also the lead author of the cited

Table 1 lists all of the principal attributes used in this 
analysis, along with their abbreviations as they appear 
in subsequent tables and figures. 
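
The first-author self-citation criterion used in this section (a citing paper counts as a self-citation when any of its authors is the lead author of the cited paper) can be sketched as follows. The function names and the simple case-insensitive string comparison are assumptions made for illustration; real author lists would need more careful name matching.

```python
def is_first_author_self_citation(cited_first_author, citing_authors):
    """True if the cited paper's lead author is among the citing authors."""
    key = cited_first_author.strip().lower()
    return any(name.strip().lower() == key for name in citing_authors)


def self_citation_fraction(records):
    """records: iterable of (cited_first_author, citing_authors) pairs,
    one pair per citation. Returns the fraction that are self-citations."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(is_first_author_self_citation(first, authors)
               for first, authors in records)
    return hits / len(records)
```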

3 Attributes of ApJ Papers 

We begin with a brief summary of how the cur- 
rent papers in the ApJ are distributed by type, subject 
area, authorship, length, and preprint submission frac- 
tion. This serves to update previous analyses published 
by Helmut Abt (references below), and provide a foun- 
dation for the other analyses that follow. As might be 
expected there was little or no significant change in these 
trends between 1999 and 2002, so in most cases we plot 
results for both sets of papers combined. The notable 
exception is in preprint postings, which have increased 
significantly over the past 5 years, and we present these 
trends below. 

The distributions of papers by subdiscipline and type
are given in Figures 1 and 2, respectively. The most
popular subdisciplines are extragalactic and stellar with
about 28% of the total each. ISM and solar (system) pa- 
pers each constitute around 15% of the total. Cosmology 
represents less than 10%, but we have separated it out as 
a distinct category because these authors are among the 
heaviest users of the astro-ph server (below). The least 
numerous are the Milky Way and the "other" subdisci- 
plines, comprising less than 8% of the total combined 
(the fraction once was higher, but most observational 
papers on Galactic astronomy now are published in the 
Astronomical Journal). 

When grouped by type, the ApJ is roughly equally 
divided between theoretical and observational papers 
(43% and 47%, respectively). The fraction of observa- 
tional papers has increased by a few percent since 1999, 
but this may reflect the impact of several large space 
missions (e.g., Chandra X-ray Observatory), and the re- 
cent trend probably is not representative of a long-term 
change in the Journal. The rediscussion and laboratory 
papers have remained constant at about 8% and 2%, 
respectively. 

These data can be compared to the analysis of Abt 
(1993) to see how the breakdown has changed over the 
past 40 years. Abt's published data set included all the 
papers published in the first six months of 1962, 1972, 
1982, and 1992 for the ApJ (including Letters and Sup- 
plements), the Astronomical Journal, and the Publica- 
tions of the Astronomical Society of the Pacific. Dr. Abt 
kindly supplied us with his original data, so we could re- 
compile the type percentages for the ApJ main journal 
alone. To minimize fluctuations due to small number
statistics we also include papers published in the latter
half of 1962 to double the original 96 papers. Figure
2 shows the results. Interestingly, the nearly equal division
between theoretical and observational papers has
persisted at the ApJ for at least 40 years. The other two
classifications are also relatively constant.

paper. This method only provides a lower limit but is consistent
with other studies (e.g., Trimble 1986).

Table 2 presents a variety of other demographic 
data for our combined 1999/2002 sample, including total 
number of authors, paper length and acceptance time, 
summarized for the ApJ as a whole. Means, 1σ standard
deviations in the means, and median numbers are given 
in each case. In most instances there is a remarkable uni- 
formity in author habits across the range of disciplines 
represented by the ApJ. However there are notable ex- 
ceptions that we highlight below and in § 4. 
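
The three summary statistics quoted in Table 2 can be computed as in the sketch below, which uses only the Python standard library. The function name and the sample author counts are illustrative assumptions, not values from our database.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard deviation of the mean, median)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    # Standard deviation of the mean = sample standard deviation / sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem, statistics.median(values)


# Made-up author counts: one paper with a very large author list
# raises the mean well above the median.
mean, sem, median = summarize([1, 2, 3, 4, 30])
```

A few papers with very long author lists raise the mean without moving the median, which is why both are tabulated.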

The average number of authors in 1999-2002 was 
4.2, and increased from 4.0 ± 0.1 to 4.5 ± 0.2 over that
3-year period. This continues a long-term increase doc- 
umented previously by Abt (2000). Interestingly, the 
median number has been growing more slowly with time 
(currently standing at 3), suggesting that much of the 
average increase is due to a growing subpopulation of 
papers with very large numbers of authors. There is 
also a pronounced distinction between theoretical and
observational papers, which differ by roughly a factor of 
two in numbers of authors, whether measured by means 
or medians. This is not very surprising given the advent 
of large multi-user observational facilities and large sur- 
veys, but it does underscore the presence of an increas- 
ing gulf in the prevailing manners and cultures in which 
theoretical and observational research are conducted in 
astronomy. 

The lengths of ApJ papers also show a slow but re- 
lentless growth, reaching 11.5 pages in 1999/2002 and 
nearly 12 pages today. Again this follows a long-term 
evolutionary trend (Abt 1981). There is no significant 
difference across subfields or paper type, apart from pa- 
pers in the "Other" category, mainly laboratory or an- 
alytical spectroscopy papers, that tend to be somewhat 
shorter on average. Likewise there is little significant 
difference in peer review times (the time between initial 
submission and final acceptance of a paper) across sub- 
disciplines. The 35% decline in the 2002 data is due to 
the introduction of web-based peer review and tighter 
editorial controls on the peer review process. 

4 Demographic Trends Among First Authors 

The first author's age during the year of publication 
is provided in Table 3. The average age is almost 40
while the median is 37. There is an interesting anti- 
correlation between median age and subject area when 
measured in terms of distance from the Earth, doubtless 
a reflection of the evolution in research interests of young 
scientists. In addition, the median age of first authors of
observational papers is 2-3 years lower than that of first
authors of theoretical papers.

Other demographic trends are summarized in Figure 3.
In 1999-2002 37% of ApJ first authors worked
at an institution outside of the U.S.; the ApJ truly is 
an international journal. Interestingly, a minority of 
first authors, only 45%, were active AAS members in
2002, which is somewhat surprising given that the ApJ
is owned and administered by the AAS. Among all ApJ
authors based in the U.S., only 63% were active members
in 2002.

Our data reveal some interesting patterns in publi- 
cation according to first author gender (see Figure 3 and 
Table 3). Among the 94% of all papers where the author's
gender could be established, 15.4% of first authors
were women. This can be compared to the findings of the 
recent survey of women in astronomical institutions by 
Hoffman & Kwitter (2003)^6. The fractional representation
of women in that survey among postdocs, assistant,
associate, and full professors is 20%, 17%, 15%, and 8%, 
respectively (with an average of 12% for all professors). 
The 15.4% representation among ApJ first authors falls 
in the middle of these numbers, and is roughly consistent 
with the average age of 37-40 for the author population. 

Closer examination of Table 3 reveals strong patterns
in these percentages across subdisciplines and paper
types. Less than 9% of papers in cosmology are
authored by women, less than half of the fraction among 
papers in extragalactic and Galactic astronomy and the 
ISM. Likewise less than 12% of theoretical papers overall 
are authored by women, compared to 18.5% of observa- 
tional and reanalysis papers. These are not second-order 
reflections of age demographics, as Table 3 will testify. 
They reflect systematic differences in participation of 
women among subdisciplines, whether by choice or by
retention. 

5 Preprint Demographics and Trends 

Table 4 summarizes the preprint posting habits of 
ApJ authors. In this case we have tabulated results for 
1999 and 2002 separately, to illustrate the increase in 
postings over time. By 2002 72% of all ApJ papers were 
posted as preprints at some time prior to publication. 
This fraction increased from 61% in 1999, though a more 
detailed look at the temporal trends suggests that this 
fraction has leveled off at 72-75% since 2001. It is notable
that a similar fraction of authors (~80%) elect to
post their accepted ApJ manuscripts on our own web- 
site. The population of "non-posters" includes scientists 
working in fields where astro-ph is not widely used (be- 
low), and a smaller fraction of authors who choose not 
to utilize the preprint posting services at all. 

One of the pleasant surprises of this study (to us) 
was the way in which authors use the preprint posting 
services. A majority of ApJ authors (61%) did not post 
their preprints until the paper had passed peer review 
and been accepted for publication. The remaining au- 
thors posted their paper for the first time at submission, 
and the fraction of authors who post early increased significantly
between 1999 and 2002. However an increasing
portion of those authors went to the trouble of updating
their astro-ph postings after acceptance; in fact only 11%
of authors posted only their submitted version of an accepted
ApJ paper. Of course this says nothing about the
astro-ph postings for papers that are rejected or otherwise
remain unpublished, and the updates only are useful
for the few astro-ph users who download and read the
updated postings. But it is reassuring to know that there
is ~90% correspondence between the preprints and the
accepted manuscripts after the time of acceptance.

^6 The poster and the raw data are available at
http://www.ruf.rice.edu/jhoffman/stats/.

Table 5 and Figure 4 show the same information but 
now broken down by subdisciplines and paper types.
Some trends are common to most fields, most notably 
the general increase in overall postings between 1999 
and 2002. More striking are the differences in preprint 
posting practices among different subfields. The posting 
rates are the highest in cosmology (95% of all published 
ApJ papers) and extragalactic astronomy (90%); in these 
fields nearly every significant ApJ paper first appears on 
the astro-ph server. At the other extreme is solar sys- 
tem (including solar astrophysics), where only 22% of 
papers are posted. This reflects a curious general trend
between astronomical distance and preprint posting fre- 
quency. However even in the solar system category the 
usage of the server is increasing over time. 

Our data reveal other interesting demographic trends 
in the preprint posting habits of our authors. In most 
fields theoretical papers are posted more frequently than 
observational papers, though the distinctions are de- 
creasing over time. Authors who use astro-ph are sig- 
nificantly younger than authors who do not post on the 
server (median age 35 years vs 44 years, respectively). 
U.S. authors use astro-ph slightly more often than non- 
U.S. authors (62.5% vs 58%). Usage by male and female 
authors is the same within the statistical errors. 

Many of the differences in overall use of astro-ph 
across subdisciplines are also mirrored in the posting
patterns. For example in 2002 only 23% of cosmology
preprints are posted for the first time after peer review,
compared to 61% for all ApJ papers. Cosmologists not 
only are the heaviest users of the system, they also are 
the quickest to post new results. Cosmology stands out 
uniquely in this regard; in every other subdiscipline we 
considered, a majority of papers (57-80%) were
posted for the first time after acceptance (exceptions ex- 
ist in some small subfields, including gamma-ray burst 
observations and gravitational microlensing, where rapid 
release of data is especially important). Theoretical
papers also stand out sharply: they are
posted early much more frequently than the other types
of papers (53% vs ~33% for observational, rediscussion,
and laboratory papers).

Many of these trends are understandable in terms of 
the different prevailing practices and subcultures within 
astronomy. Theoretical papers often are posted early 
so the authors can obtain independent feedback from 
colleagues during the peer review process, and in some 
cases with the intent of establishing scientific priority 
for new ideas. Observers tend to be more conservative, 
partly out of a desire to confirm the veracity of their 
data before disseminating them widely, and in some cases
to protect their proprietary interest in the data until the
respective paper is accepted for publication.

6 Citation Patterns 

In order to analyze the trends in citation rates for our 
papers we compiled citation data from the ADS database 
in the summer of 2003. In order to have a reliable time 
base over which to collect these data we restricted all of 
the citation analysis to the ApJ papers in the 1999 sub- 
set. Table 6 summarizes the mean and median number 
of ADS citations, averaged according to subdiscipline, 
paper type, and demographic category. These values are 
integrated over the entire citation lifetime of the paper 
up to mid-2003 (for reference, the average citation frequency
for ApJ papers, during the first two years after
the publication year, is 6.6 citations/paper/year^7).

6.1 Citation Rates Across Subfields and Paper Characteristics

We were somewhat pleasantly surprised to find that 
citation rates of ApJ papers do not differ very widely
across subdisciplines or demographic categories. Some 
patterns are evident; citation rates are highest among 
the mainstream categories of astronomy (cosmology, extragalactic,
Galactic, stellar), where large numbers of
papers are written, and systematically lower in the so- 
lar system and "other" categories, where the number 
of practicing scientists is much lower, and where the 
ADS citation data are likely to be more incomplete. An 
anomaly we do not understand is the significantly lower 
citation rate for papers in the Galactic/ISM subfield. 
There is also a small but significant difference in the citation
frequencies for ApJ papers by U.S. and non-U.S.
authors. Part of this is a second-order effect of the dif- 
ferences in citation rates by subfield alluded to above. 

Abt (1984) showed that the mean number of cita- 
tions increased linearly with both paper page length and 
the number of authors. Figures 5 and 6 show that these 
trends are still valid. The points show the mean citation 
rates (and standard deviations) as functions of paper 
length (Fig. 5) and number of authors (Fig. 6), while 
the histograms show the number distributions of papers 
by length and author number, with the respective scales 
given on the righthand axis of each plot. The number 
of single-author papers has declined steadily over
the years, from 40% of all papers published in
1974 (Abt 1984) to 13% in 1999. Over the same period 
the fraction of papers with more than 6 authors increased 
from 3% to 18%. The single-authored paper is becoming 
an endangered species! 

6.2 Demographic Trends 

Figure 7 shows the mean and the one sigma standard
deviation of the mean ADS citations as a function of first
author age (left ordinate) and the age distribution (right
ordinate). If taken literally the results suggest that the
impact of an average astronomer's papers peaks during
their 30's, but this peak is not statistically significant.
What is significant is a very steep decline in citation
frequency after age 50. This may partly reflect external
factors such as subfield of interest and preprint posting
practices (§ 6.3), but age itself clearly is important.

^7 Data compiled by the Institute for Scientific Information
(ISI).

The data also show some differences in citation frequencies
between papers by male and female first authors,
but the patterns are inconsistent; while the average
citation rates for male first authors are higher, the
median rates for male first authors are lower. After fur- 
ther investigation we found that the difference in mean 
rates is driven by a small number of very highly cited pa- 
pers that skew the averages. This is shown in Figure 8, 
which compares the normalized citation distributions for 
papers with male and female first authors. The distribu- 
tions for men and women are virtually the same (within 
statistical errors) until one reaches papers with 50 or 
more citations. Among the latter super-cited papers all 
but two have male first authors. 

With the limited sample size we cannot discern for
certain whether this difference in very highly cited
papers is a product of small number statistics or a gen- 
uine imbalance in the authorship of major, highly-cited 
papers. However we would not be surprised if part of the 
difference is real, because it would fall in line with other 
known demographic trends, such as the strong under- 
representation of women among the ranks of full pro- 
fessors and equivalent rank staff positions in astronomy 
(Hoffman & Kwitter 2003), and the strong (and disturb- 
ing) under-representation of women among the major 
AAS prize winners over the past two decades. Without
a firmer statistical foundation we would caution against 
overinterpreting these citation patterns, but we intend to 
collect more data on these rates over time to ascertain 
whether the gender-based differences in citation patterns 
persist. 

Table 6 also compiles the citation frequencies as 
functions of the first author's AAS membership status 
and country. Papers by active AAS members and U.S.-based
authors are cited ~30% more frequently than papers
by non-AAS members and non-U.S. authors. In this
case much of the difference can be attributed to other 
factors, such as differences in subdiscipline distributions 
and the lesser likelihood that non-U.S. authors post their
papers on astro-ph (§ 6.3). 

6.3 Effect of Preprint Posting on Citation Rates 

Table 6 also tabulates the citations separately for pa- 
pers that were posted on astro-ph and those that were 
not. These reveal the most interesting result of our entire 
study, namely that ApJ papers posted prior to publica- 
tion as astro-ph preprints are cited more than twice as 
often as papers that are not posted on astro-ph. This 
pattern persists across every subdiscipline and subcate- 
gory of paper we analyzed. 
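
The comparison behind this result is a ratio of mean citation counts for two groups of papers. A minimal sketch follows; the function name and the `(citations, posted)` record layout are assumptions, and the numbers in the usage note are invented for illustration.

```python
from statistics import mean


def posting_citation_ratio(papers):
    """papers: iterable of (citations, posted) pairs, posted being a bool.

    Returns mean citations of posted papers divided by mean citations
    of papers that were not posted.
    """
    papers = list(papers)
    posted = [cites for cites, was_posted in papers if was_posted]
    unposted = [cites for cites, was_posted in papers if not was_posted]
    if not posted or not unposted:
        raise ValueError("need at least one paper in each group")
    unposted_mean = mean(unposted)
    if unposted_mean == 0:
        raise ValueError("unposted papers have no citations")
    return mean(posted) / unposted_mean
```

With invented counts of 20 and 10 citations for two posted papers and 5 and 9 for two unposted ones, the ratio is 15/7, or a little over two.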

How does one interpret this striking difference in citation
frequencies? At first we speculated that it resulted
from the longer visibility of a paper that was posted as 
a preprint. For papers published in 1999 there was a lag 
of nearly a year between average submission and publi- 
cation time (since reduced by 40%), so papers that were 
posted as preprints have a longer effective citation life- 
time. To test this hypothesis we tabulated the time his- 
tories of the citations for 1999 papers, and plotted them 
separately according to whether they were also posted 
on astro-ph. The results are shown in Figure 9. As ex- 
pected, the papers that had been circulated as preprints 
enjoyed a surge in early citations that was not mirrored 
in the papers that were seen for the first time in the Jour- 
nal. However the same plot shows that the difference in 
citation rates persists for more than 3 years after the 
ApJ paper is published. This cannot be an artifact of 
a longer "shelf life" of the preprints; instead it strongly 
suggests that at least half of the author community only 
becomes aware of other papers when they are posted on 
astro-ph. We discuss the implications of these findings 
further in § 7. 
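
A citation time history of the kind described above can be built from the publication dates of the citing papers. The sketch below bins citations by whole years relative to the publication year and averages over a group of papers; the yearly binning, the function name, and the record layout are assumptions for illustration and need not match the resolution used in Figure 9.

```python
from collections import Counter


def mean_citation_history(papers):
    """papers: list of (publication_year, citing_years) pairs.

    Returns {years_since_publication: mean citations per paper}.
    Negative offsets are citations received before publication.
    """
    papers = list(papers)
    if not papers:
        return {}
    totals = Counter()
    for pub_year, citing_years in papers:
        for year in citing_years:
            totals[year - pub_year] += 1
    return {offset: count / len(papers)
            for offset, count in sorted(totals.items())}
```

Running this separately on the posted and unposted groups gives two histories that can be compared year by year.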

We used the same data to determine whether citation
rates were influenced by when an author posts their
paper to the preprint server. Figure 9 also subdivides 
the astro-ph posted papers by those posted at submission
("Pre" and "Up", solid line), and those posted after
peer review and acceptance ("Post", dotted line).
The papers with the earliest preprint postings show a 
marginally higher citation rate, which may simply reflect 
their slightly longer visibility time. Over long periods 
the two citation distributions are indistinguishable. We 
should note that in making this comparison we excluded 
5 preprints in the extreme tail of the citation distribution 
(roughly 100 or more citations). Most of these were "Pre" postings of
time-critical data (e.g., gamma-ray burst observations). 
This 2% of the sample is sufficient to boost the citation
rate of the entire "Pre" sample by nearly 20%. How- 
ever the difference appears to reflect the nature of these 
particular papers, and not the effects of preprint posting 
habits. We intend to update our data in the future to 
conflrm whether the posting time has neglible effect on 
the impact of the subsequently published paper. 
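
The sensitivity of a mean citation rate to a handful of extreme papers can be illustrated with a minimal sketch. The citation counts below are hypothetical and were chosen only to mimic the proportions quoted above (5 outliers making up 2% of a sample); they are not the actual data, and the helper name boost_from_outliers is an assumption made for this example.

```python
from statistics import mean, median


def boost_from_outliers(core, outliers):
    """Fractional increase in the mean when `outliers` are added to `core`."""
    return mean(core + outliers) / mean(core) - 1.0


# Hypothetical citation counts, chosen only to mimic the proportions in the
# text: 5 of 250 papers (2%) sit in the extreme tail of the distribution.
core = [20] * 245       # assumed typical papers, 20 citations each
outliers = [220] * 5    # assumed extreme-tail papers, 220 citations each

boost = boost_from_outliers(core, outliers)
share = len(outliers) / (len(core) + len(outliers))

print(f"outlier share of sample: {share:.0%}")
print(f"boost to the mean: {boost:.0%}")
print(f"median with outliers: {median(core + outliers)}, without: {median(core)}")
```

In this toy sample the mean rises by about a fifth while the median is unchanged, which is why a comparison of means between posting types is better made with the extreme tail set aside.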

Table 6 also lists the distribution of citations for the 
various subcategories of papers. In all subdisciplines and 
classes the number of citations is significantly higher for 
papers submitted to astro-ph compared to their ApJ- 
only counterparts. However the magnitude of the differ- 
ence varies widely between subdisciplines, in ways that 
mirror the overall preprint posting pattern in those sub- 
fields. For example in cosmology, where 85% of ApJ au- 
thors post on astro-ph, the citation rates between posted 
and unposted papers differ by more than a factor of ten! 
It appears that papers in this field that are not posted 
on astro-ph are virtually ignored. In contrast, in the 
ISM field posting of preprints has only a small (~30%)
effect on citation rates, as compared to the factor-of- 
two average for all ApJ papers. This partly reflects the 
lower overall penetration of astro-ph into this subfield, 
and the availability of other electronic newsletters and
alerting services for new papers in parts of this field.
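
As a rough check on the enhancement factors quoted in this section, the ratio of mean citations for posted and unposted papers can be recomputed from Table 6. The sketch below uses only rows for which both means are quoted; the dictionary and function names are illustrative assumptions, not part of the original analysis.

```python
# Mean citations per paper as (posted on astro-ph, not posted), copied from
# the rows of Table 6 for which both means are quoted.
TABLE6_MEANS = {
    "EG": (21.8, 10.4),
    "ISM": (14.8, 11.4),
    "S": (21.0, 11.3),
    "SS": (16.1, 8.8),
    "O": (16.3, 6.0),
    "All": (20.5, 10.0),
}


def posting_ratio(posted_mean: float, unposted_mean: float) -> float:
    """Ratio of mean citations for posted versus unposted papers."""
    return posted_mean / unposted_mean


ratios = {group: posting_ratio(*means) for group, means in TABLE6_MEANS.items()}

for group, ratio in ratios.items():
    # A ratio of 1.30 corresponds to a 30% enhancement.
    print(f"{group:>3}: ratio = {ratio:.2f}, enhancement = {100 * (ratio - 1):.0f}%")
```

With these inputs the ISM ratio comes out near 1.3 and the ratio for all papers near 2.05, consistent with the ~30% and factor-of-two figures in the text.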

7 Comparison with Non-Peer-Reviewed Papers 

We have shown that the increased visibility of papers 
afforded by preprint postings has a significant (factor-of- 
two) effect on the subsequent citations to those papers. 
How does this compare to the other factors that influ- 
ence the impact of an article? Citation statistics for the 
major journals are compiled by the ISI, and they show 
a dispersion of approximately a factor of two among ci- 
tation rates for the half-dozen major astronomy and as- 
trophysics journals, and roughly an order of magnitude 
range over all of the significant journals. So the change 
in impact from posting a paper on astro-ph is compara- 
ble to the differences in overall impact among the major 
journals. 

Less information is available on how publishing in 
a peer-reviewed journal overall influences a paper's impact,
and how posting on astro-ph increases the visibility
of non-peer-reviewed articles. To provide at least a rudi- 
mentary answer we used ADS to compile citation fre- 
quencies for 2673 papers that appeared in 31 conference 
proceedings published in 1999. We took pains to select a 
distribution of subdisciplines that mirrored the ApJ pa- 
per distribution for the same year, and ranged in visibil- 
ity from major symposia to smaller meetings. Between 
1999 and mid-2003, the same time base for the ApJ pa- 
per data shown in Table 6, these papers were cited a total 
of 2181 times, for a mean of 0.82 citations/paper. This 
compares to a mean of 16.4 citations/paper for the 1999 
ApJ papers, exactly 20 times higher. We find a similar
ratio when we compare the official ISI impact factors
for the ApJ with those for IAU symposia volumes, so
we believe that our methodology is robust. Similarly,
Kurtz et al. (2004) found that 68% of 1995 ApJ papers
were cited in the year 2000 while only 1.6% of Bulletin of
the American Astronomical Society abstracts published
in 1995 were cited in 2000.

In order to assess the impact of preprint posting on 
these articles, we selected a subset of the more highly 
cited proceedings, determined which papers had been 
posted as preprints on astro-ph, and compared citation 
rates as described earlier for ApJ papers. We found that 
posting a conference paper on astro-ph increased the im- 
pact of the subsequent paper by a factor of 2.2 on aver- 
age, nearly the same as the factor of 2.05 enhancement 
for ApJ papers. So preprint posting increases the relative 
visibility of non-peer-reviewed papers by a comparable 
factor, but the factor-of-20 difference between proceed- 
ings papers and ApJ papers remains the same regardless 
of whether the respective papers are posted on astro-ph 
or not. This should serve as a caution to anyone who 
might believe that preprint posting alone is sufficient to 
assure that a paper is widely recognized and cited. 
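
The arithmetic behind the factor-of-20 and factor-of-two comparisons in this section can be reproduced in a few lines. This is only a sketch: the inputs are the numbers quoted in the text and in the "All" row of Table 6, and the variable names are assumptions introduced for the example.

```python
# Conference proceedings sample (numbers quoted in the text).
CONFERENCE_PAPERS = 2673
CONFERENCE_CITATIONS = 2181

# 1999 ApJ sample, "All" row of Table 6.
APJ_MEAN = 16.4            # all ApJ papers
APJ_POSTED_MEAN = 20.5     # posted on astro-ph
APJ_UNPOSTED_MEAN = 10.0   # not posted on astro-ph

conference_mean = CONFERENCE_CITATIONS / CONFERENCE_PAPERS

# The text works with the mean rounded to two decimals (0.82).
journal_to_conference = APJ_MEAN / round(conference_mean, 2)

# Enhancement from preprint posting for ApJ papers; the text quotes 2.2 for
# the subset of conference papers examined.
apj_enhancement = APJ_POSTED_MEAN / APJ_UNPOSTED_MEAN

print(f"conference mean: {conference_mean:.2f} citations/paper")
print(f"ApJ / conference: {journal_to_conference:.1f}")
print(f"ApJ posting enhancement: {apj_enhancement:.2f}")
```

Using the unrounded conference mean instead changes the journal-to-conference ratio only in the first decimal place, so the factor-of-20 statement does not depend on the rounding.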

8 Discussion: Implications for Electronic Publishing

What lessons can we draw from these results? One 
implication is unmistakable: authors who wish to maximize
the visibility of their papers should post their articles
to the large e-print servers such as astro-ph. Exactly
when the paper is posted appears to have little effect on
citation rates. Another lesson to be gleaned from these 
results is that as the pace of astronomical discovery has 
accelerated over the past decade, astronomers want to 
learn about new results as quickly as possible, rather 
than wait the additional weeks or months for final, edited 
versions of the results to appear. 

Although the use of astro-ph as an alerting service is 
rapidly achieving near-universal use in the astrophysics 
community, authors remain highly divided about the 
contents and timing of their postings. At this time the 
ApJ author community (that is, the community of au- 
thors who write papers that are eventually accepted) is 
roughly equally divided between those who use astro-ph 
as a posting service for accepted, peer-reviewed papers, 
and those who post papers before they are reviewed, 
either to establish priority or to solicit feedback from 
colleagues during the peer review cycle. These cultural 
differences are strongly polarized across subfields and be- 
tween observers and theorists. To some extent these pat- 
terns pre-date the era of electronic preprints, but the rel- 
ative convenience and low cost, worldwide dissemination 
of results that is offered by the e-print servers clearly 
has caused more authors to migrate toward the bulletin
board model. It will be interesting to see whether this 
trend continues in the future. 

Our data document how thoroughly the astro-ph 
preprint server, over the time span of a decade, has sup- 
planted the departmental preprint shelves and the per- 
sonal mailings of preprints as the primary means that 
astronomers become aware of new papers in their field. 
One striking feature in Table 6 is the relative consis- 
tency of citation rates across subfields and types, when 
preprints of the papers are posted on astro-ph; the cita- 
tion frequencies rarely vary by more than ±20-30% of 
the average rate. In contrast, the citation rates for pa- 
pers that are never posted as preprints, apart from being 
a factor of two lower overall, fluctuate from field to field 
by more than a factor of five. As a larger fraction of 
papers is posted as preprints, the visibility of those re- 
maining papers that are not posted on e-print servers is 
sure to decline even further, as it has already in cosmol- 
ogy. Just as publishing in refereed journals is regarded 
as an essential prerequisite for establishing the credibil- 
ity and documenting an individual's or group's scientific 
research, posting this work on the arXiv server is becoming
essential for disseminating that research to the
largest possible audience. 
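
The consistency of citation rates among posted papers can be checked directly from the astro-ph column of Table 6. The following sketch computes the fractional deviation of each subdiscipline mean from the overall mean for posted papers; the names used are illustrative assumptions, and only the subdiscipline rows are included.

```python
# Mean citations per astro-ph posted paper by subdiscipline, from Table 6.
POSTED_MEANS = {
    "C": 22.5,
    "EG": 21.8,
    "MW": 20.1,
    "ISM": 14.8,
    "S": 21.0,
    "SS": 16.1,
    "O": 16.3,
}
OVERALL_POSTED_MEAN = 20.5  # "All" row, astro-ph papers

# Fractional deviation of each subdiscipline from the overall posted mean.
deviations = {
    field: (value - OVERALL_POSTED_MEAN) / OVERALL_POSTED_MEAN
    for field, value in POSTED_MEANS.items()
}

for field, dev in deviations.items():
    print(f"{field:>3}: {dev:+.0%}")

# Subdiscipline with the largest absolute deviation.
extreme_field = max(deviations, key=lambda f: abs(deviations[f]))
print(f"largest deviation: {extreme_field} ({deviations[extreme_field]:+.0%})")
```

For these seven subdisciplines every deviation stays within 30% of the overall rate, with the ISM field showing the largest departure.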

We should caution the reader that other factors 
probably contribute to the difference in citation fre- 
quency between preprint-posted and unposted papers. 
For example, authors with new results they believe to 
be of special significance are much more likely to post 
their results on astro-ph. The same is true for papers 
with particular time-critical value. These effects will al- 
ways cause pre-posted papers to be more highly cited 
on average, and without an independent means to rank
paper quality it is impossible to disentangle them from
the effects of increased visibility afforded by astro-ph.

Given that e-print posting clearly is becoming a cen- 
tral factor influencing the visibility and citation of subse- 
quently published journal articles, does this mean that 
the journals themselves are becoming irrelevant in the 
process of scientific communication? We think not. Al- 
though the preprint servers are filling a vital function 
by disseminating these articles quickly and efficiently,
all of the other attributes of the papers that make them 
so valuable and citable are enforced by the peer review 
and the other editorial requirements of the respective 
journals. All of the citation data presented here refer to 
accepted and published ApJ papers, which were vetted 
by peer review and stringent standards of copyediting, 
bibliographic referencing, and data presentation. The 
corresponding preprints, regardless of when they were 
posted, were all prepared with the expectation of meet- 
ing these rigorous ApJ publication standards. In the ab- 
sence of such editorial standards and controls it is naive 
to expect that papers would continue to maintain this 
level of scientific quality, English presentation, and clar- 
ity of tables, figures, and referencing entirely on their 
own. If one wants to visualize what a fully open-access, 
self-reviewed literature in astronomy might look like, the 
conference proceedings discussed earlier provide an interesting
analogy. Conference papers offer many of the
features of a free-publication system, with little or no
peer review, minimal production standards, no copyedit- 
ing, and when posted on astro-ph virtually free distribu- 
tion and access, with equal visibility to journal articles. 
Nevertheless, such papers are cited only 5% as often as
comparable ApJ papers, even when posted on astro-ph. 
To be fair the two sets of papers are usually intended 
for entirely different purposes, but the comparison un- 
derscores the critical role that the destination publishing 
source plays in dictating the quality and long-term value 
of their respective preprints. 

We believe instead that our study illustrates the 
strength of the symbiosis that currently exists between 
the major peer-reviewed journals, the arXiv preprint
server, and the NASA ADS system. The journals largely 
set the scientific and editorial standards that are repli- 
cated in much of the preprint literature, while the e-print 
servers have increasingly supplanted much of the role of 
the journals as the first access point to new research re- 
sults, with a publication model that embodies superb 
distribution efficiency and ease of use. Even if each journal
were able to replicate this efficiency, the advantages
of a single consolidated source for preprints, covering
all journals and other publications, clearly make it the 
model of choice for preprint distribution. Although any 
system of publishing can be improved, the vitality of as- 
tronomical publishing can be attributed to the effective 
combination of a family of peer-reviewed electronic journals
with an efficient and user-friendly preprint distribution
system, and a powerful bibliographic database
system at ADS linking all of these resources. 

Table 1: Attributes and Acronym Dictionary

Attribute        Acronym  Definition / Example

Subdiscipline
Cosmology        C        Galaxy formation, Cosmic Microwave Background,
                          Hubble and cosmological constants
Extra-Galactic   EG       High-redshift galaxies, Active Galactic Nuclei,
                          InterGalactic Medium, galaxy clusters
Milky Way        MW       Milky Way structure, Galactic center, globular clusters
Galactic ISM     ISM      Galactic Super Nova remnants, InterStellar Medium,
                          and star formation
Stellar          S        All stars including Supernova and Gamma-Ray bursts
Solar system     SS       Sun and solar system objects
Other            O        Instrumentation, atomic and nuclear processes

Classifications
Theoretical      T        Theory paper with no observations
Observational    O        New observation paper
Rediscussion     R        Paper discussing previous observations
Laboratory       L        Laboratory or instrumentation

Astro-ph preprint types
PreApJ           Pre      One preprint posted before ApJ submission.
PostApJ          Post     One preprint posted after ApJ acceptance.
Updated          Up       Multiple preprint submissions.
Unknown          Unk      The preprint's posted date does not match either the
                          ApJ submitted or accepted dates.

Table 2: ApJ trends^a

                               1999                  2002
Type                       Mean       Median     Mean       Median

Number of Authors          4.0±0.1    3          4.5±0.2    3
Published page length      11.4±0.2   10         11.6±0.2   10
Acceptance time in days    177±5      142        133±4      95

^a For the combined 1999 and 2002 data sets.



Table 3: First Author Demographics^a

Type   Sample   Mean       Median   Total    M/F     Unk^b
       size^c   Age        Age      Papers   Ratio   (%)

C      50       36.0±1.3   35       144      10.4    5
EG     235      38.2±0.6   36       452      4.2     5
MW     39       33.7±1.2   31       71       3.8     6
ISM    139      40.0±1.0   36       249      5.1     5
S      256      40.4±0.7   38       461      6.2     7
SS     88       44.2±1.3   42       213      7.4     9
O      14       43.5±2.5   46       49       13.0    14
T      292      39.1±0.6   35       711      7.4     8
O      461      40.1±0.5   38       775      4.4     4
R      64       37.9±1.3   36       125      4.9     6
L      4        42.8±3.2   46       28               27
All    821      39.6±0.4   37       1639     5.0     6

Note. — See Table 1 for explanations of subdiscipline and type codes.
^a For the combined 1999 and 2002 data sets.
^b Fraction of total sample that couldn't be assigned a gender.
^c Number of papers where the first author's age is known.



Table 4: The astro-ph preprint submission types

Time         Total   astro-ph   Pre^a   Post^a   Unk^a   Up^a
(mm/yy)              (%)        (%)     (%)      (%)     (%)

07-12/99     795     61         14      68       <1      18
02-03/01^b   296     73         19      64       2       16
07-12/02     844     72         11      61       <1      27

^a See Table 1 for an explanation of astro-ph codes.
^b From original unpublished AAS Journal - astro-ph survey.

Table 5: astro-ph preprint submissions by subdiscipline
and classification

Grouping Total Pre Post Unk Up 
Papers (%) (%) (%) (%) 

Subdiscipline (1999) 

C 56 21 43 2 34 

EG 181 7 82 10 

MW 30 20 57 23 

ISM 51 8 76 2 14 

S 135 19 61 1 19 

SS 15 20 53 26 

O 16 38 44 19 

Subdiscipline (2002) 

C 75 25 23 52

EG 201 6 69 25 

MW 26 15 61 23 

ISM 79 6 80 14 

S 194 11 59 1 29 

SS 26 12 77 4 8 

O 7 14 57 29 

Classification (1999)

T 214 22 50 1 26 

O 219 8 81 <1 11 

R 46 9 80 11 

L 5 100 

Classification (2002)

T 264 17 47 <1 36 

O 295 6 75 <1 19 

R 47 9 57 34 

L 2 100 

Table 6: Distribution of ADS citations

            All papers                          astro-ph papers                     Non-astro-ph papers
Group       # Papers  Mean         Median       # Papers  Mean         Median       # Papers  Mean         Median

Subdiscipline
C           65        [illegible]  [illegible]  56        22.5±3.8     11           9         [illegible]  [illegible]
EG          226       [illegible]  13           181       21.8±2.4     15           45        10.4±1.4     7
MW          34        [illegible]  [illegible]  30        20.1±3.1     [illegible]  4         [illegible]  [illegible]
ISM         130       12.7±1.3     9            51        14.8±1.6     10           79        11.4±1.8     7
S           213       17.5±1.6     10           135       21.0±2.2     11           78        11.3±2.1     6.5
SS          94        9.9±1.3      6            15        16.1±3.4     12           79        8.8±1.4      6
O           33        11.0±2.4     6            16        16.3±4.1     14.5         17        6.0±1.8      2

Classification
T           360       15.1±1.1     9            214       19.4±1.7     12           146       8.8±0.9      5
O           351       16.8±1.4     11           219       20.1±2.0     14           132       11.2±1.5     6
R           67        23.2±3.3     13           46        28.0±4.1     [illegible]  21        [illegible]  [illegible]
L           17        7.4±1.9      5            5         12.4±4.9     12           12        5.3±1.6      3.5

Gender
Male        638       17.2±1.0     10           397       21.3±1.5     13           241       10.5±1.0     6
Female      112       14.1±1.2     11           66        17.2±1.7     14.5         46        9.8±1.3      6

AAS
Member      352       19.5±1.6     11           223       23.9±2.3     15           130       12.1±1.7     7
Nonmember   443       13.8±0.8     9            261       17.6±1.1     12           181       8.4±0.8      5

Country
USA         501       17.7±1.2     11           313       21.9±1.8     14           188       10.6±1.1     6
Other       294       14.1±0.9     8            170       17.8±1.3     11.5         124       8.9±1.2      5

All         795       16.4±0.8     10           484       20.5±1.2     13           311       10.0±0.8     6

Note. — The mean and median values for the astro-ph types "Pre" + "Up" and "Post"
are 24.8±3.2 and 15 and 18.1±1.0 and 12, respectively.

Fig. 4. — Astro-ph submission percentages as a
function of subdiscipline (left figure) and classification
(right figure). The left and right bars of each column
give the percentages in 1999 and 2002, respectively.



Fig. 1. — Distribution in the ApJ by subdiscipline for
the combined 1999 and 2002 data sets. See Table 1 for
an explanation of the subdiscipline codes.

Fig. 2. — Classification distribution as a function of
time. The first four points are from Abt (1993)
excluding all but ApJ papers (see text). The last two
points are from this study. The solid, dotted, dashed,
and dot-dashed lines are the observational, theoretical,
rediscussing, and laboratory papers, respectively.







Fig. 3. — ApJ distribution percentages by the first 
author's country, AAS membership, and gender for the 
combined 1999 and 2002 data sets. 



Fig. 5. ADS citations as a function of published page
length in 1999. The boxes (left ordinate) give the mean
and 1σ uncertainty range for each bin. The histogram
of the distribution (right ordinate) is provided beneath
the citation data.

Fig. 6. — ADS citations as a function of number of
authors (left ordinate) and the citation distribution
(right ordinate). The boxes give the mean and 1σ
uncertainty range for each author bin.

Fig. 7. — ADS citations as a function of first author
age in 1999. The boxes (left ordinate) give the mean
and 1σ uncertainty range for each bin. The histogram
of the distribution (right ordinate) is provided beneath
the citation data.

Fig. 9. — Histogram of ADS citations as a function of
astro-ph submission type. The solid line shows papers
submitted to astro-ph at the same time as ApJ ("Pre"
and "Up"), the dotted line shows papers submitted to
astro-ph after ApJ acceptance ("Post"), and the
dashed line shows papers never submitted to astro-ph.
Five papers with anomalously high citations have been
excluded from the statistics (see text).



Fig. 8. — Histogram of ADS citations as a function of 
gender. The solid line is for males and the dotted line 
is for females. Both lines have been normalized by the 
total number of cites per gender. 